Abstract

We  present  a distributed  model-based  diagnostics 
architecture  for  embedded  diagnostics.  We  extend 
the  traditional  model-based  definition  of  diagnosis 
to  a distributed  diagnosis  definition,  in  which  we 
have  a collection  of  distributed  components  whose 
interconnectivity  is  described  by  a directed  graph. 
Assuming  that  each  component  can  compute  a local 
minimal  diagnosis  based  only  on  sensors  internal 
to  that  component  and  knowledge  only  of  its  own 
system  description,  we  describe  an  algorithm  that 
guarantees  a globally  sound,  complete  and  minimal 
diagnosis for the complete system. By compiling
diagnoses for groups of components based on the
interconnectivity graph, the algorithm efficiently
synthesizes the local diagnoses computed in distributed
components into a globally-sound system
diagnosis using a graph-based message-passing approach.

1 INTRODUCTION 

This  article  proposes  a new  technique  for  diagnosing  dis- 
tributed systems  using  a model-based  approach.  We  assume 
that  we  have  a system  consisting  of  a set  of  inter-connected 
components,  each  of  which  computes  a local  (component)  di- 
agnosis.1 We  adopt  the  structure-based  diagnosis  framework 
of  Darwiche  [8]  for  synthesizing  component  diagnoses  into 
globally-sound  diagnoses,  where  we  obtain  the  structure  from 
the  component  connectivity.  Unlike  previous  approaches  that 
compute  diagnoses  using  the  system  observations  and  a sys- 
tem description  [8;  10],  we  transform  the  component  diagno- 
sis synthesis  into  the  space  of  minimal  diagnoses.  Assum- 
ing that  each  component  can  compute  a local  minimal  diag- 
nosis based  only  on  sensors  internal  to  that  component  and 
knowledge  only  of  the  component  system  description,  we  de- 
scribe an  algorithm  that  guarantees  a globally  sound,  com- 
plete and  minimal  diagnosis  for  the  complete  system.  This 

* Research supported in part by The Office of Naval Research
under contract number N00014-98-3-0012.

1 Note that one can compute component diagnoses using any
method which returns a minimal diagnosis (with respect to a
specified minimality criterion).


algorithm uses as input the directed graph (digraph) describing
the connectivity of distributed components, with arc
directionality derived from the causal relation between the
components. Given that real-world graphs of this type are
either  tree-structured  or  can  be  converted  to  tree-structured 
graphs,  we  propose  a graph-based  message-passing  algorithm 
which  passes  diagnoses  as  messages  and  synthesizes  local  di- 
agnoses into  a globally  minimal  diagnosis  in  a two-phase  pro- 
cess. By  compiling  diagnoses  for  collections  of  components 
(as  determined  by  the  graph’s  topology),  we  can  significantly 
improve  the  performance  of  distributed  embedded  systems. 
We  show  how  this  approach  can  be  used  for  the  distributed 
diagnosis  of  systems  with  arbitrary  topologies  by  transform- 
ing such  topologies  into  trees. 

One  important  point  to  stress  is  that  this  approach  synthe- 
sizes diagnoses  computed  locally,  and  places  no  restriction  on 
the  technique  used  to  compute  each  local  diagnosis  (e.g.,  neu- 
ral network,  Bayesian  network,  etc.),  provided  that  each  local 
diagnosis  is  a least-cost  or  most-likely  diagnosis.  The  syn- 
thesis approach  takes  this  set  of  self-diagnosing  sub-systems, 
together  with  the  connectivity  of  these  sub-systems,  to  com- 
pute globally-consistent  diagnoses. 

The  approach  presented  in  this  article  assumes  that  all 
faults  are  diagnosable  (i.e.,  can  be  isolated)  through  a central- 
ized algorithm.  We  examine  whether  a distributed  approach 
can  diagnose  all  faults,  since  a distributed  algorithm  can  iso- 
late faults  no  better  than  a centralized  algorithm.  Issues  re- 
lating to  restricted  diagnosability  of  both  centralized  and  dis- 
tributed algorithms  due  to  insufficient  observable  data  (e.g., 
when  the  suite  of  sensors  is  insufficient  to  guarantee  complete 
diagnosability)  are  examined  in  [21]. 

This  article  is  organized  as  follows.  Section  2 introduces 
the  application  model  that  we  use  to  demonstrate  our  ap- 
proach. Section  3 introduces  our  modeling  formalism,  and 
specifies  our  notion  of  centralized  and  distributed  model. 
Section  4 describes  how  we  diagnose  distributed  models. 
Section  5 surveys  some  related  work  on  this  topic.  We  sum- 
marize our  conclusions  in  Section  6. 

2 IN-FLIGHT  ENTERTAINMENT 
EXAMPLE 

Throughout this article we use a simplified example of an
In-Flight Entertainment (IFE) system. Figure 1 shows the
schematic for an IFE system fragment where we have (1) a
transmitter module (Tx) that generates 10 movie channels
(consisting of both video and audio signals) and 10 audio
channels; (2) two area distribution boxes (ADB); and (3)
attached to each $ADB_j$, two passenger units, $P_{j1}$ and
$P_{j2}$. For $ADB_j$, passenger $i$, $i = 1, 2$, has a controller
$C_{ji}$ for selecting a video or audio channel, plus an audio unit
$a_i$ and a video display $V_i$. Control signal $C_{ji}$ is sent by
passenger $i$ to $ADB_j$ and then to the transmitter, which in turn
sends an RF signal (RF) to each passenger.

We  adopt  a notion  of  causal  influence  for  describing  how 
different  components  affect  the  value  of  a signal  as  it  propa- 
gates through  the  system.  For  example,  the  RF  signal  causally 
influences  the  passenger  audio  and  video  outputs.  In  this 
model the observables are the control signals, plus, for
passenger $i$ downstream of $ADB_j$, sound ($S_{ji}$) and video
display ($VD_{ji}$). We assign a fault-mode to the transmitter and
to each ADB and passenger unit.


Figure 1: Schematic of IFE fragment, showing the main modules
and the directed arcs of data-flows.

Our  modeling  approach  makes  the  following  assumptions. 
First,  we  can  specify  a system  using  an  object-oriented  ap- 
proach. In  other  words,  a system  can  be  defined  as  a col- 
lection of  components,  which  are  connected  together,  e.g., 
physically,  as  in  an  HVAC  system,  or  in  terms  of  data  trans- 
mission/reception, as  in  the  IFE  example.  Our  primary  com- 
ponent consists  of  a block,  which  has  properties:  input  set, 
output  set,  fault-mode,  and  equations.  Given  the  fault-mode 
and  input  set,  the  equations  provide  a mapping  to  the  output 
set.  In  other  words,  the  inputs  are  the  only  nodes  with  causal 
arcs  into  the  block,  and  the  outputs  are  the  only  nodes  with 
causal arcs out of the block. Typically, we have causal
dependence of block outputs $\omega_i$ on inputs $\iota_i$, i.e.,
$\omega_i \propto \iota_i$.2

This  distributed  model  consists  of  a set  of  sub-models,  or 
blocks, which may be connected together. In our IFE example,
the transmitter block has inputs of control signals $C_1$ and
$C_2$, and outputs an RF signal.

Second,  we  assume  that  each  component  computes  diag- 

2 The causal function $\propto$ can be generalized to include
propositions, relations, probabilistic functions, qualitative
differential equations, etc. We do not address such a
generalization here.


noses  based  on  data  local  to  the  component.  We  do  not  place 
any  restrictions  on  the  type  of  algorithm  used  to  compute  the 
diagnosis,  except  that  the  diagnosis  be  a least-cost  diagno- 
sis. We  will  describe  the  cost  function  used  by  our  synthesis 
algorithm  in  the  following  section. 

3 MODEL-BASED  DIAGNOSTICS  USING 
CAUSAL  NETWORKS 

This  section  formalizes  our  modeling  and  inference  approach 
to  diagnostics  and  control  reconfiguration.  We  first  introduce 
the  model-based  formalism,  and  then  extend  these  notions  to 
capture  a distributed  model-based  formalism. 

3.1  FLAT  (CENTRALIZED)  MODELS 

We adopt and extend the model-based representation for
diagnosis of Darwiche [8]. We model the system using a
causal network:

Definition 1 A system description is a tuple
$\Phi = (\mathcal{V}, Q, \Sigma)$, where

• $\mathcal{V}$ is a set of variables comprising two variable types:
A is a set of variables (called assumables) representing
the failure modes of the components, V is a set of
non-assumable variables ($V \cap A = \emptyset$) representing system
properties other than failure modes;

• Q is a directed acyclic graph (DAG) called a causal
structure whose nodes are members of $V \cup A$ and whose
directed arcs represent causal relations between pairs of
nodes;

• and $\Sigma$ is a set of propositional sentences (called the
domain axioms) constructed from members of $V \cup A$ based
on the topological structure of Q.

This definition of system description differs from the
standard definition (called SD in [22]) only in that we include
a graph Q to complement the domain axioms, the set of failure
modes (commonly called COMPS), and the non-assumable
variables.

The set of non-assumable variables consists of two mutually
exclusive subsets: $V_{obs}$ (the set of observables) and
$V_{unobs}$ (the set of unobservables).

We can capture structural properties of the system description
using the directed acyclic graph, or DAG, Q.3 For example,
if an actuator determines whether a motor is on or not, we say
that the actuator causally influences the motor. More
generally, A may directly causally influence B if A is a
predecessor of B in Q. We use $B \propto A$ to denote the direct
causal influence of the value of A on the value of B.4 Through
transitivity, we can deduce indirect causal influence. For
example, if $B \propto A$ and $C \propto B$, then A indirectly
influences C.

We capture the notion of direct causal influence, i.e., a
node N together with those nodes that are directly causally
affected by N, using a clan. We define the clan of a node N
of a DAG Q in terms of graphical relationships as follows:

3 In  other  system  description  specifications,  e.g.  [12],  these  struc- 
tural relations  are  captured  using  logical  sentences. 

4 This notion of causal influence does not guarantee that A
influences B, but that A may influence B.


Definition 2 (Clan) Given a DAG Q, the clan $Y(N_i)$ of a
node $N_i \in Q$ consists of the node $N_i$ together with its
children in Q.

We  adopt  the  notion  of  clan  because  we  are  interested  in 
synthesizing  diagnoses  computed  at  a set  of  distributed  nodes 
organized  in  a tree  structure.  The  intuition  behind  the  algo- 
rithm is  as  follows:  given  local  diagnoses,  we  start  at  the  par- 
ents of  leaves  in  the  decomposition  tree  and  move  up  the  tree 
to  the  root,  identifying  if  any  node’s  diagnosis  is  affected  by 
the  diagnoses  of  its  children,  and  if  so,  synthesizing  those  di- 
agnoses. To  perform  each  synthesis  operation,  we  use  a clan. 

A clan  is  dual  to  the  well-known  notion  of  family,  which 
is  typically  defined  as  a node  together  with  its  parents  in  Q. 
This  notion  is  important  because  we  need  to  synthesize  local 
diagnostics  within  tree-structured  systems,  and  the  clan  pro- 
vides a more  efficient  means  for  doing  so  than  the  family  for 
tree-structured  systems.  For  simplicity  of  notation,  we  will 
denote  the  clan  for  node  Ni,  Y(Ni),  as  Yi. 
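
As an illustration only, the following Python sketch contrasts the clan
and the family of a node. It assumes the DAG is stored as a dictionary
from each node to its list of children; the function names and the small
IFE-like graph are hypothetical and are not part of the formalism above.

```python
from typing import Dict, List, Set


def clan(dag: Dict[str, List[str]], node: str) -> Set[str]:
    """Clan of `node`: the node together with its children."""
    return {node, *dag.get(node, [])}


def family(dag: Dict[str, List[str]], node: str) -> Set[str]:
    """Family of `node`: the node together with its parents."""
    parents = {p for p, children in dag.items() if node in children}
    return {node} | parents


# Hypothetical fragment: Tx -> ADB1 -> {P11, P12}
ife = {"Tx": ["ADB1"], "ADB1": ["P11", "P12"], "P11": [], "P12": []}

print(sorted(clan(ife, "ADB1")))
print(sorted(family(ife, "ADB1")))
```

In a tree every node has at most one parent, so a family never holds more
than two nodes, whereas a clan groups a node with all of the children
whose diagnoses it must reconcile.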

It  is  also  important  to  define  restrictions  of  subsets  of  ob- 
servables: 

Definition 3 (Restriction) We denote by $\theta_i$ the restriction of
an instantiation $\theta$ of variables V to the instantiation of a
subset $V_i$ of V. We denote the restriction of variable set T to
variables in sub-system description $\Phi_i$ by $T^{\Phi_i}$.

One  of  the  key  elements  of  diagnosing  a system  is  the  in- 
stantiation of  observables,  since  a diagnosis  is  computed  for 
abnormal  observable  instantiations. 

Definition 4 (Instantiation) $\theta^{\Phi_i}$ is an instantiation of
observables $V_{obs}^{\Phi_i}$ for system description $\Phi_i$.
$\Theta^{\Phi_i}$ denotes the set of all instantiations of observables
$V_{obs}^{\Phi_i}$.

We  specify  failure-mode  instantiations  and  partition  the 
possible  states  into  normal  states  and  faulty  states  as  follows: 

Definition 5 (Mode-Instantiation) $A^*$ is an instantiation of
behavior modes for mode-set A. Further, we decompose
$A^*$ such that $A^* = A^F \cup A^{\emptyset}$, where $A^{\emptyset}$
denotes normal system behaviour, i.e., all modes are normal, and
$A^F$ denotes a system fault, which may consist of simultaneous
faults in multiple components.

An assumable (behavior-mode variable) specifies the
discrete set of behavior-states that a component can
have, e.g., an AND-gate can be either OK, stuck-at-0,
or stuck-at-1. Our IFE-system, with component-set
$\{Tx, ADB_1, ADB_2, P_{11}, P_{12}, P_{21}, P_{22}\}$, can have a
mode-instantiation in which all components are OK except $P_{11}$,
which is in audio-fail mode. In this case we have
$A^{\emptyset} = \{Tx\text{-mode} = OK, ADB_1\text{-mode} = OK,
ADB_2\text{-mode} = OK, P_{12}\text{-mode} = OK,
P_{21}\text{-mode} = OK, P_{22}\text{-mode} = OK\}$ and
$A^F = \{P_{11}\text{-mode} = \text{audio-fail}\}$.

3.2  DISTRIBUTED  SYSTEM  DESCRIPTIONS 

This  section  describes  our  distributed  formalism,  which  ap- 
plies to  collections  of  interconnected  components,  or  blocks. 
We  assume  that  a distributed  system  description  is  provided 
either  by  the  user  or  is  deduced  from  the  physical  constraints 
of  available  local  diagnostic  agents  and  physical  connectiv- 
ity. For  example,  many  engineering  systems,  such  as  com- 
mercial aircraft,  are  subdivided  into  Line-Replaceable  Units 


(LRUs),  based  on  a number  of  factors,  such  as  fault-isolation 
capabilities,  physical  constraints,  and  ease  of  repair.  An  LRU 
typically  consists  of  a number  of  connected  sub-systems,  as 
in  the  Passenger  Unit  of  the  IFE  example,  which  consists  of 
circuit-cards  to  select  audio/video  channels  and  to  drive  the 
audio  and  video  output  devices.  It  is  standard  practice  in 
commercial  aircraft  to  isolate  faults  only  to  the  LRU-level, 
and  replace  faulty  components  only  at  the  LRU-level. 

Definition 6 (Decomposition Function) A decomposition
function is a mapping $\psi(\Phi) = \Phi_{dist}$ that decomposes a
centralized system description $\Phi$ into a distributed system
description $\Phi_{dist} = \{\Phi_1, \ldots, \Phi_m\}$. The distributed
system description induced by a decomposition function $\psi$ is
defined by a decomposition $\Pi$ over the system variables V, i.e., a
collection $X = \{X_1, \ldots, X_m\}$ of nonempty subsets of V such
that (1) $\forall i = 1, \ldots, m$, $X_i \in 2^V$; (2)
$V = \bigcup \{X_i \mid X_i \in \Pi\}$.
When $\xi_{ij} = X_i \cap X_j \neq \emptyset$, we call $\xi_{ij}$ the
separating set, or sepset, of variables between $X_i$ and $X_j$.
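
The two conditions of Definition 6 and the resulting sepsets can be
checked mechanically. The sketch below is a minimal illustration under
assumed data: blocks are represented as Python sets of variable names,
and the variable names (e.g. `RF1`, `Tx_mode`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def is_decomposition(variables: Set[str], blocks: Dict[str, Set[str]]) -> bool:
    """Blocks must be nonempty subsets of V whose union is V."""
    subsets_ok = all(len(b) > 0 and b <= variables for b in blocks.values())
    return subsets_ok and set().union(*blocks.values()) == variables


def sepsets(blocks: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Non-empty pairwise intersections, keyed by the pair of block names."""
    out = {}
    for i, j in combinations(sorted(blocks), 2):
        shared = blocks[i] & blocks[j]
        if shared:
            out[(i, j)] = shared
    return out


# Hypothetical blocks for a Tx -> ADB1 -> P11 chain.
blocks = {
    "Tx": {"C1", "Tx_mode", "RF"},
    "ADB1": {"RF", "ADB1_mode", "RF1"},
    "P11": {"RF1", "P11_mode", "S11"},
}
all_vars = set().union(*blocks.values())

print(is_decomposition(all_vars, blocks))
print(sepsets(blocks))
```

Here the sepsets are exactly the signals passed between neighbouring
blocks, which is what makes them the natural edge labels of the
decomposition graph.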

We  can  describe  a distributed  system  description  in  terms 
of  a decomposition  graph.  A decomposition  graph  is  a graph- 
ical representation  of  the  system  model,  when  viewed  as  a 
collection  of  connected  blocks.  In  this  graph  each  vertex  cor- 
responds to  a block,  and  each  directed  edge  corresponds  to 
a directed  (causal)  link  between  two  blocks.  Figure  2 shows 
the  decomposition  graph  for  the  extended  IFE  example. 5 

A decomposition  graph  is  a directed  tree,  or  D-tree,  which 
is  defined  as  follows: 

Definition 7 A D-tree $T_D$ is a directed graph with vertices
$V(T_D)$ and a vertex $r_0$, called the root, with the property
that for every vertex $r \in V(T_D)$ there is a unique directed
walk from $r_0$ to $r$.

Definition 8 A decomposition graph $G_X$ is an edge-labeled
D-tree $G(X, \mathcal{E}, \xi)$ with (1) vertices $X = \{X_1, \ldots, X_m\}$,
where each vertex consists of a collection of variables of Q,
(2) directed edges that join pairs of vertices with non-empty
intersections, where arc direction is specified by the causal
direction of the arcs between blocks in the decomposition graph,
i.e., $\mathcal{E} = \{(X_i, X_j) \mid X_i \cap X_j \neq \emptyset,
X_j \propto X_i\}$, and (3) edge labels (or separators) defined by the
edge intersections,
$\xi = \{\xi_{ij} \mid X_i \cap X_j \neq \emptyset\}$.

We  assume  that  in  a distributed  system  description,  for  any 
block  all  sensor  data  is  local,  and  all  equations  describing  dis- 
tributed subsystems  refer  to  local  sensor  data  and  local  con- 
ditions. 

3.3  DIAGNOSIS  SPECIFICATION 

We  define  the  notion  of  diagnosis  as  follows: 

Definition 9 (Diagnosis) Given a system description $\Phi$ with
domain axioms $\Sigma$ and an instantiation $\theta$ of $V_{obs}$, a
diagnosis $D(\theta)$ is an instantiation of behavior modes
$A^F \cup A^{\emptyset}$ such that
$\Sigma \cup \theta \cup A^F \cup A^{\emptyset} \not\models \bot$.

5 We do not show the feedback loops of control requests
($C_1, C_2, C_{11}, \ldots, C_{22}$), since all edges concerning
observables can be cut [7].


Figure  2:  Decomposition  graph  of  extended  IFE  system  de- 
scription. Here  an  oval  corresponds  to  a vertex,  and  a block 
corresponds  to  a sepset.  We  specify  the  variables  associated 
with  each  vertex  in  the  graph. 

This diagnostic framework provides the capability to rank
diagnoses using a likelihood weight $\kappa_i$ assigned to each
assumable $A_i$, $i = 1, 2, \ldots$ Using the likelihood algebra
defined in [8], we can compute the likelihood assigned to each
diagnosis for observation $\theta$. We refer to a (diagnosis, weight)
pair using $(D(\theta), \kappa)$. We use the weights to rank diagnoses,
i.e., least-weight diagnoses are the most likely. This provides
a notion of minimal diagnosis, i.e., a diagnosis of weight $\kappa$
such that there exists no lesser-weight diagnosis.
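
To make the ranking concrete, the following sketch enumerates
consistency-based diagnoses for a toy two-component model and keeps the
least-weight ones. Everything in it is an assumption made for
illustration: the model (`predict_sound`), the mode names, and the use of
an additive weight with weight 1 per fault stand in for the likelihood
algebra of [8], which is not reproduced here.

```python
from itertools import product

# Hypothetical mode sets and fault weights (OK has weight 0, faults weight 1).
MODES = {"Tx": ["ok", "fail"], "P": ["ok", "audio_fail"]}
WEIGHT = {"ok": 0, "fail": 1, "audio_fail": 1}


def predict_sound(assign):
    """Toy system description: sound is nominal only if every mode is ok."""
    return "nom" if all(m == "ok" for m in assign.values()) else "none"


def diagnoses(observed_sound):
    """All mode instantiations consistent with the observation, with weights."""
    names = sorted(MODES)
    result = []
    for combo in product(*(MODES[n] for n in names)):
        assign = dict(zip(names, combo))
        if predict_sound(assign) == observed_sound:
            result.append((assign, sum(WEIGHT[m] for m in combo)))
    return result


def minimal(diags):
    """Keep only the diagnoses of least weight."""
    if not diags:
        return []
    best = min(w for _, w in diags)
    return [(d, w) for d, w in diags if w == best]


for diag, weight in minimal(diagnoses("none")):
    print(diag, weight)
```

With a missing sound signal, the two single-fault instantiations tie at
weight 1 and the double fault at weight 2 is discarded.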

3.4  LOCAL/GLOBAL  DIAGNOSTICS 

Our  methodology  rests  on  the  determination  of  when  com- 
ponent diagnoses  are  independent,  in  which  case  the  global 
diagnosis  is  just  the  conjunction  of  the  component  diagnoses. 
We  apply  the  decomposition  theorem  of  [8]  to  this  case  of 
distributed  diagnostics: 

Theorem 1 If we have a system description $\Phi$ consisting of
two component system descriptions $\Phi_1$ and $\Phi_2$, and a
system observation $\theta$, and if the variables shared by $\Phi_1$
and $\Phi_2$ all appear in $\theta$, then

$D^{\Phi}(\theta) = D^{\Phi_1}(\theta_1) \wedge D^{\Phi_2}(\theta_2)$.

This theorem states that a diagnosis is decomposable provided
that the system observation contains the variables
shared between $\Phi_1$ and $\Phi_2$. However, what happens when
the observation $\theta$ does not contain all variables shared
between $\Phi_1$ and $\Phi_2$? One solution [8] is to decompose the
computation of $D^{\Phi}$ by performing a case-analysis of all shared
variables $\xi_{12}$. However, this case-analysis approach is
exponential in $|\xi_{12}|$, the number of variables on which we do
case-analysis. Hence if we wanted to embed the diagnostics code,
such a case-analysis might be too time-consuming when
performed on a system-level model.
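
The precondition of Theorem 1 and the cost of the fallback case-analysis
can be illustrated with a few lines of Python. This is a sketch under
stated assumptions: variables are plain strings, all shared variables are
taken to have the same domain size (`domain_size`, a hypothetical
parameter), and only unobserved shared variables are counted as needing
case-analysis.

```python
def decomposable(vars1, vars2, observed):
    """Theorem 1 precondition: every shared variable appears in the observation."""
    return (vars1 & vars2) <= observed


def case_analysis_size(vars1, vars2, observed, domain_size=2):
    """Number of cases to enumerate over the unobserved shared variables."""
    unobserved_shared = (vars1 & vars2) - observed
    return domain_size ** len(unobserved_shared)


# Hypothetical variable sets for a transmitter block and a passenger block.
tx_vars = {"C1", "RF", "Tx_mode"}
p_vars = {"RF", "S11", "P11_mode"}

print(decomposable(tx_vars, p_vars, {"C1", "S11"}))
print(case_analysis_size(tx_vars, p_vars, {"C1", "S11"}))
```

In this example the RF signal is shared but not observed, so the local
diagnoses cannot simply be conjoined; each additional unobserved shared
variable multiplies the number of cases by the domain size.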

In  the  following  we  assume  that  each  component  computes 
a local  diagnosis,  i.e.,  a diagnosis  based  only  on  local  ob- 
servables and  on  equations  containing  only  local  variables.  In 
contrast  a global  diagnosis  is  one  based  on  global  observables 
and  on  equations  describing  all  system  variables.  Our  task  is 


to  integrate  these  local  component  diagnoses  into  a globally 
sound,  minimal  and  consistent  diagnosis,  since  for  many  sys- 
tems the  diagnostics  generated  locally  are  either  incomplete 
or  not  minimal. 

Note  that  we  can  obtain  global  diagnostics  for  a modular 
system  by  composing  local  blocks  and  diagnosing  the  entire 
system  model.  However,  it  is  true  in  many  cases  that  global 
and  local  diagnostics  may  differ.  We  now  define  a notion  of 
correspondence  between  local  and  global  diagnoses. 

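The conjunction of the diagnoses of the set of distributed system
descriptions is defined as
$D_{dist}(\theta) = \bigwedge_{\Phi_i \in \Phi_{dist}} D^{\Phi_i}(\theta_i)$,
and we know that $D_{dist}(\theta) = D^{\Phi}(\theta)$ only when $\theta$
covers $\bigcup_{i,j} \xi_{ij}$, i.e., when every sepset variable is
observed (Theorem 1).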

We can compute the diagnoses for this set of distributed
system descriptions either using an on-line algorithm, or by
pre-computing the set of diagnoses for $D_{dist}(\theta)$. In the
following, we outline the compiled method of diagnosis.

We define a table, called a clan table, to specify local and
global diagnoses for collections of blocks. This table compiles
the local case-analysis required by Theorem 1. We will
show later how to use this table for our diagnosis synthesis
algorithm.

Definition 10 A clan (or local/global diagnosis) table for
block-set $B = \{\Phi_i, \ldots, \Phi_j\}$ is a table consisting of tuples
(observable-instantiation, global diagnosis, weight) for all
abnormal instantiations of observables $\theta$ in B.

Note  that  we  can  use  the  compositionality  of  blocks  to 
show  that  any  time  we  compose  a system  description  from 
multiple  blocks,  we  obtain  “global”  diagnostics  for  that  com- 
posed system  description  when  we  compute  diagnoses  over 
the  composed  system  description.  Hence  the  “global”  diag- 
nosis for  each  collection  of  blocks  is  computed  from  a system 
description  generated  from  the  composition  of  the  system  de- 
scriptions of  the  blocks  in  B,  using  the  observables  from  B. 
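
A clan table in the sense of Definition 10 can be compiled off-line by
enumerating the composed model of the block-set. The sketch below does
this for a hypothetical block-set consisting of one ADB and two passenger
units; the behavioural model in `predict`, the two-valued modes, and the
unit fault weights are assumptions chosen for illustration and are not
the IFE model used in the paper.

```python
from itertools import product

COMPONENTS = ["ADB1", "P11", "P12"]   # hypothetical block-set B
OBSERVABLES = ["S11", "S12"]          # one sound observable per passenger


def predict(modes):
    """Toy composed model: a passenger hears sound only if its own unit
    and the upstream ADB are both ok."""
    adb_ok = modes["ADB1"] == "ok"
    return {
        "S11": "nom" if adb_ok and modes["P11"] == "ok" else "none",
        "S12": "nom" if adb_ok and modes["P12"] == "ok" else "none",
    }


def build_clan_table():
    """Map each abnormal observation to its least-weight diagnosis."""
    table = {}
    for combo in product(["ok", "fail"], repeat=len(COMPONENTS)):
        modes = dict(zip(COMPONENTS, combo))
        obs = tuple(predict(modes)[o] for o in OBSERVABLES)
        if all(v == "nom" for v in obs):
            continue  # only abnormal instantiations are tabulated
        weight = combo.count("fail")  # every fault has weight 1
        if obs not in table or weight < table[obs][1]:
            faults = frozenset(c for c in COMPONENTS if modes[c] == "fail")
            table[obs] = (faults, weight)
    return table


for obs, (faults, weight) in sorted(build_clan_table().items()):
    print(obs, sorted(faults), weight)
```

When both passengers lose sound, the table records the single upstream
ADB fault rather than two passenger-unit faults, because the composed
model admits a lesser-weight explanation than the conjunction of the
local ones.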

Example 1 Table 1 contrasts the local and global diagnoses
for a set of scenarios where the set B of blocks is an ADB
with downstream passenger units. In these scenarios, we
compute the (probabilistically) most-likely diagnosis, assuming
that all faults are equally likely, i.e., have weight 1. Moreover,
in defining a local diagnosis in Table 1, we report the
conjunction of all local diagnoses, i.e., the local diagnosis is
ADB-diagnosis $\wedge$ $P_1$-diagnosis $\wedge$ $P_2$-diagnosis. In
scenarios 1, 2 and 4, the local and global diagnoses are identical.
However, in scenarios 3, 5 and 6, they differ: the passenger units
each assume a local fault, whereas the transmitter unit is the
faulty one (since a single transmitter fault is much more likely
than two simultaneous faults, one in each passenger unit).6

Given  this  potential  for  discrepancy  between  local  and 
global  diagnoses,  we  map  the  decomposition  graph  into  a 
representation,  the  clan  graph,  from  which  we  can  synthesize 
globally  sound  and  complete  minimal  diagnoses  from  local 
minimal  diagnoses.  Figure  3 shows  the  clan  graph  for  the 
extended  IFE  example. 

6 These differences arise due to different instantiations of the RF
signal in the local and global diagnosis. We hide the details of the
case-analysis of shared variables for simplicity of presentation.


         | ADB1 Unit     | Pass. Unit 11 | Pass. Unit 12 | Diagnosis
Scenario | C11   | C12   | S11  | VD11   | S12  | VD12   | LOCAL                             | GLOBAL
1        | audio | audio | nom. | none   | nom. | none   | -                                 | -
2        | audio | audio | none | none   | nom. | none   | P11-audio-fail                    | P11-audio-fail
3        | audio | audio | none | none   | none | none   | P11-audio-fail ∧ P12-audio-fail   | Xaudio
4        | video | video | nom. | nom.   | nom. | none   | P12-video-fail                    | P12-video-fail
5        | video | video | nom. | none   | nom. | none   | P11-video-fail ∧ P12-video-fail   | Xvideo
6        | audio | video | none | none   | none | none   | P11-audio-fail ∧ P12-video-fail   | ADB1-fail

Table 1: Diagnostic Scenarios. We denote a nominal passenger output using nom., and abnormal observable data in
bold-face. Xaudio denotes degraded audio, and Xvideo denotes degraded video.


Figure  3:  Clan  graph  of  extended  IFE  system  description. 

Definition 11 (Clan graph) A clan graph $G_Y$ of a DAG
$Q(V, E)$ of vertices V and edges E is an edge-labeled D-tree
$G(Y, \mathcal{E}, \xi)$ defined as follows: (1) vertices
$Y = \{Y_1, \ldots, Y_m\}$, where each node $Y_i$ consists of a clan
of Q; (2) edges defined by non-empty intersections between pairs of
vertices, $\mathcal{E} = \{(Y_i, Y_j) \mid Y_i \cap Y_j \neq \emptyset\}$;
and (3) separators defined by the edge intersections,
$\xi = \{\xi_{ij} = Y_i \cap Y_j\}$.
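
For a tree-structured decomposition graph, the clan graph of
Definition 11 can be built directly from the parent-child relation. The
following sketch makes two assumptions that are not stated in the text:
the tree is given as a dictionary from each node to its list of children,
and only the clans of non-leaf nodes are kept as vertices (a leaf's clan
is just the leaf itself).

```python
def clan_graph(children):
    """Return (clans, edges): one clan per non-leaf node, and directed
    edges between intersecting clans labeled with their separator."""
    clans = {n: frozenset([n, *kids]) for n, kids in children.items() if kids}
    edges = {}
    for parent, kids in children.items():
        for kid in kids:
            # In a tree, two such clans intersect only when one node is the
            # parent of the other; the shared variable is the child itself.
            if kid in clans:
                edges[(parent, kid)] = clans[parent] & clans[kid]
    return clans, edges


# Hypothetical IFE-like tree: Tx feeds two ADBs, each with two passengers.
tree = {
    "Tx": ["ADB1", "ADB2"],
    "ADB1": ["P11", "P12"],
    "ADB2": ["P21", "P22"],
    "P11": [], "P12": [], "P21": [], "P22": [],
}

clans, edges = clan_graph(tree)
print({name: sorted(members) for name, members in clans.items()})
print({edge: sorted(sep) for edge, sep in edges.items()})
```

Each separator here is a single block, which is the block whose diagnosis
is passed upward as a message between the two clans.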

The  following  section  shows  how  we  use  the  clan  graph  for 
distributed  diagnosis. 

4 DISTRIBUTED  MODEL-BASED 
DIAGNOSIS 

This  section  describes  our  distributed  model-based  diagnosis 
algorithm.  We  first  map  the  directed  graph  of  the  system  into 
a tree  using  tree-decomposition  techniques,  and  then  employ 
a message-passing  algorithm  on  the  tree. 

4.1  TREE-DECOMPOSITION 

The  work  on  tree-decomposition  stems  from  work  on 
treewidth  and  graph  minors  [23].  A good  review  of  the  liter- 
ature can  be  found  in  [5].  We  define  the  basic  notions  below. 

Definition 12 A tree decomposition of an undirected graph
$G = (V, E)$ is a pair $(X, T)$ with $T = (I, F)$ a tree, and
$X = \{X_i \mid i \in I\}$ a family of subsets of V, one for each
node of T, such that

1. $\bigcup_{i \in I} X_i = V$;

2. for all edges $\{v, w\} \in E$ there exists an $i \in I$ with
$v \in X_i$ and $w \in X_i$; and

3. for all $i, j, k \in I$, if $j$ is on the path from $i$ to $k$ in T,
then $X_i \cap X_k \subseteq X_j$.

The  last  property  is  known  as  the  running-intersection  prop- 
erty within  the  BN  community.  The  clique-tree  algorithm 


computes  a tree-decomposition  in  which  each  node  of  the 
tree  is  a clique,  and  undirected  edges  correspond  to  shared 
variables  between  cliques. 

Given a tree-decomposition, inference complexity is based
on the treewidth, defined as follows. The width of a tree
decomposition is $\max_{i \in I} |X_i| - 1$. The treewidth of a graph G
is the minimum width over all tree decompositions of G. The
treewidth bears close relations to the maximal vertex degree
and maximal clique of a graph, so it provides a measure of
the complexity of diagnostic inference, among other things.
If a graph has a low treewidth then inference on the graph
is guaranteed to be easy. The task of computing treewidth is
NP-hard [2]. Many algorithms exist that, given a graph with n
variables, will compute the treewidth in time polynomial
in n but exponential in the treewidth; see, for example,
[4].
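
The three conditions of Definition 12 and the width formula are easy to
check for a given candidate decomposition. The sketch below is
illustrative only: bags are Python sets keyed by tree-node index, the
function names are hypothetical, and the code assumes (without verifying)
that `tree_edges` really forms a tree. Condition 3 is tested in its
equivalent form: the bags containing any given vertex must form a
connected subtree.

```python
def is_tree_decomposition(vertices, edges, bags, tree_edges):
    # Condition 1: the bags cover all vertices.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every graph edge lies inside some bag.
    for v, w in edges:
        if not any(v in bag and w in bag for bag in bags.values()):
            return False
    # Condition 3 (running intersection): bags holding v are connected in T.
    adjacent = {i: set() for i in bags}
    for i, j in tree_edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    for v in vertices:
        holding = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holding))
        seen, frontier = {start}, [start]
        while frontier:
            node = frontier.pop()
            for nxt in adjacent[node] & holding:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Path graph a - b - c - d with a valid decomposition of width 1.
path_vertices = ["a", "b", "c", "d"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
good_bags = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
bad_bags = {1: {"a", "b"}, 2: {"c", "d"}, 3: {"b", "c"}}
chain = [(1, 2), (2, 3)]

print(is_tree_decomposition(path_vertices, path_edges, good_bags, chain))
print(width(good_bags))
```

The second bag assignment (`bad_bags`) covers every vertex and edge but
violates the running-intersection property, since the two bags containing
`b` are separated by a bag that does not contain it.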

Directed  Tree-Decomposition 

The  difference  between  the  standard  literature  on  tree- 
decompositions  and  the  task  addressed  here  is  that  the  stan- 
dard literature  focuses  on  undirected  graphs,  and  we  focus  on 
directed  graphs.  We  capture  and  exploit  the  directionality  of 
causal  relations  during  all  phases  of  diagnostic  inference.  For 
example,  if  we  have  an  abstract  hierarchical  specification  of 
a system  and  compute  diagnostics  for  each  abstract  hierar- 
chical block,  we  still  preserve  the  directionality  of  causality 
among  the  abstract  blocks.  We  exploit  this  directionality  us- 
ing a diagnostic  synthesis  algorithm  operating  on  a directed 
tree. 

Definition 13 A D-tree $T_D$ is a directed graph with vertices
$V_{T_D}$, and a vertex $V_0$, called the root, with the property that
for every vertex $V \in V_{T_D}$ there is a unique directed walk from
$V_0$ to $V$.

The  tree-decomposition  results  have  been  generalized  to 
directed  graphs  in  [16],  and  we  make  use  of  some  of  those 
results  here.  The  key  change  is  that  we  need  to  preserve  or- 
dering of  edges  during  the  decomposition  process.  To  capture 
such properties, we first need to define a notion of variable
ordering, called Z-normality.

Definition 14 Let Q be a digraph and let $Z \subseteq V$. A set S is
Z-normal if and only if the vertex-sets of the strong components
of $Q \setminus Z$ can be numbered $S_1, S_2, \ldots, S_d$ such that

1. if $1 \le i < j \le d$, then no edge of Q has a head in $S_i$
and tail in $S_j$, and

2. either $S = \emptyset$ or $S = S_i \cup S_{i+1} \cup \cdots \cup S_j$
for some integers $i, j$ with $1 \le i \le j \le d$.


Definition 15 A D-tree decomposition of a digraph
$Q = (V, E)$ is a pair $(X, T_D)$ with $T_D = (I, F)$ a D-tree, and
$X = \{X_i \mid i \in I\}$ a family of subsets of V, one for each
node of $T_D$, where the edges are numbered $J = \{1, \ldots, l\}$ with
$F = \{F_j : j \in J\}$, such that

1. $\bigcup_{i \in I} X_i = V$;

2. for all edges $\{v, w\} \in E$ there exists an $i \in I$ with
$v \in X_i$ and $w \in X_i$;

3. for all $i, j, k \in I$, if $j$ is on the path from $i$ to $k$ in
$T_D$, then $X_i \cap X_k \subseteq X_j$; and

4. if $j \in J$, then $\bigcup \{X_i : i \in I, i > j\}$ is $X_j$-normal.

The width of a D-tree decomposition is the least integer $w$ such
that for all $i \in I$, $|X_i \cup \bigcup_j X_j| \le w + 1$, where the
union is taken over all edges $j \in J$ incident with $i$. For
comparison, the undirected width is $\max_{i \in I} |X_i| - 1$.
The treewidth of a graph Q is the least integer $w$ such that Q
has a D-tree decomposition of width $w$.

For  the  class  of  applications  addressed  in  this  article,  the 
input  graphs  Q for  the  system  description  are  digraphs,  and 
the  decomposition  graph  and  clan  graph  are  both  D-tree  de- 
compositions of  Q.  For  more  general  digraph  topologies,  by 
applying  an  algorithm  for  generating  D-tree  decompositions, 
we  can  convert  the  digraphs  into  a decomposition  graph,  and 
apply the diagnostic synthesis approach. Many of the properties
of undirected tree-decompositions hold for the directed
case [16].

4.2  DIAGNOSIS  OF  SYSTEMS  WITH 
TREE-STRUCTURED  GRAPHS 

We  now  describe  an  approach  to  diagnosing  systems  with 
tree-structured  decomposition  graphs. 

We  assume  that: 

• We  are  provided  with  the  component  system  descrip- 
tions and  their  connectivity; 

• There  is  a single  root  in  the  decomposition  graph  (which 
is  a component  with  no  parent-components),  and  each 
leaf  is  a component  with  no  child-component; 

• Nodes have indices starting at the root ($X_1$), increasing
based on a breadth-first expansion from the root and
ending at the leaves, labeled $X_{n-s}, \ldots, X_n$;

• Each  component  computes  a local  diagnosis  based  on 
local  observables. 

We  base  our  approach  on  synthesizing  diagnoses,  starting 
from  the  leaf  components  and  ending  up  at  the  root  of  the 
tree.  We  first  decompose  the  decomposition  graph  into  a clan 
graph.  Based  on  the  clan  graph  we  construct  a clan  table  for 
each  node  in  the  graph. 

This algorithm is inspired by the Bayesian network clique-tree
approach of [17], but replaces the clique-tree with
an analogous clan-tree, and passes diagnoses as messages.
Analogous to the clique-tree method's clique-table
pre-computation, this approach requires pre-computing
clan-tables, but for embedded systems this results in
computationally simpler algorithms than those adopted in the past.

Under this scheme, we pre-compute clan tables for each
clan in $G_Y$. Given an observation $\theta$ for blocks $X_i, \ldots, X_k$,
where $X_i, \ldots, X_k$ are members of a clan $Y \in G_Y$, each block
computes diagnostics locally. We then compute the most
likely fault-mode assignment for $Y$ through a process we call
diagnostics synthesis, which entails table-lookup in the clan
table of the minimal diagnosis given $\theta$. The algorithm
synthesizes final diagnoses, going from the leaves to the root. This
guarantees a sound, complete and globally minimum system
diagnosis.

In this approach we first need to pre-compute the clan table,
and then use that table for diagnostic synthesis. We can
pre-compute the clan table from a set of blocks $\{B_i\}$ as
follows:

1. Generate the decomposition graph $G_X$ from $\{B_i\}$,
with indices increasing in a breadth-first manner from the root.

2. Generate the clan graph $G_Y$ of $G_X$.

3. Compute the clan table for each clan $Y_i$ in $G_Y$.
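As a rough illustration of step 3, the sketch below builds a table for a single clan by enumerating every pattern of locally faulty members and storing a minimum-weight explanation for each. The function name, the unit fault weights, and in particular the rule that one fault in the clan's parent component explains any pattern of member symptoms are toy assumptions made for this sketch; in the approach described here the entries would be derived from the clan's system description.

```python
from itertools import combinations

def build_clan_table(parent, members, fault_weight=1):
    """Map each set of locally faulty members to a minimum-weight clan diagnosis."""
    table = {}
    for r in range(len(members) + 1):
        for faulty in combinations(members, r):
            key = frozenset(faulty)
            # Candidate 1: blame each symptomatic member separately.
            candidates = [(key, fault_weight * len(faulty))]
            if faulty:
                # Candidate 2 (toy assumption): one parent fault explains all symptoms.
                candidates.append((frozenset([parent]), fault_weight))
            # min() keeps the first candidate on ties, so a single local fault stays local.
            table[key] = min(candidates, key=lambda d: d[1])
    return table

table = build_clan_table("ADB1", ["P11", "P12"])
```

The table has one entry per subset of members, which makes the exponential growth in clan size visible even in this toy setting.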

Given an observation $\theta$, the diagnostic synthesis algorithm
is as follows:

1. Given observation $\theta$, each block $B_i$ computes its local
diagnosis $D_i(\theta)$ and likelihood $\kappa(D_i(\theta))$;

2. Mark all nodes $X_i$, $i = 1, \ldots, n$ with flag=0;

3. Loop for $j = n$ to $1$:

(a) If flag=0 for $X_j$ do:

For each node $X_i$ in the clan $Y(X_j)$, look up the
corresponding clan diagnosis $D^{\Phi}_Y(\theta)$ and weight
$\kappa(D^{\Phi}_Y(\theta))$ in the clan-table;

If $\kappa(D^{\Phi}_Y(\theta)) < \sum_{X_k \in Y} \kappa(D_k(\theta))$:

• revise fault-mode assignment to nodes in $Y(X_j)$,
by (a) setting the minimum-weight diagnosis
mode-variable; (b) if any local diagnosis $D'$ is [...];

• assign values to variables in $Y$ based on $D$ and $\theta$;

• if reassignment is sound, pass message with fault [...];

• set flag for $X_i \in Y(X_j)$ to 1;
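The leaf-to-root loop can be sketched in Python as follows. This is a simplified sketch under stated assumptions rather than the algorithm itself: a clan is taken to be a parent component together with its children, each clan-table entry maps a set of faulty members to a single replacement fault and its weight, the soundness check on the reassignment is omitted, and all names (`synthesize`, `clan_tables`, the component labels) are hypothetical.

```python
def synthesize(children, order, local_weight, clan_tables):
    """Leaf-to-root diagnosis synthesis on a tree of components.

    children:     parent -> list of child components
    order:        components in breadth-first order, root first
    local_weight: component -> weight of its local diagnosis (missing = nominal)
    clan_tables:  parent -> {frozenset of faulty children: (clan fault, weight)}
    """
    parent_of = {c: p for p, cs in children.items() for c in cs}
    weight = {node: local_weight.get(node, 0) for node in order}
    flagged = set()                      # nodes whose flag has been set to 1
    for node in reversed(order):         # loop j = n down to 1
        if node in flagged or node not in parent_of:
            continue
        head = parent_of[node]
        members = children[head]
        faulty = frozenset(m for m in members if weight[m] > 0)
        local_sum = sum(weight[m] for m in members)
        entry = clan_tables.get(head, {}).get(faulty)
        if entry is not None and entry[1] < local_sum:
            # The clan diagnosis is strictly cheaper: revise the assignment.
            fault, clan_weight = entry
            for m in members:
                weight[m] = 0
            weight[fault] = clan_weight  # acts as the message passed toward the root
        flagged.update(members)
    return {node: w for node, w in weight.items() if w > 0}

children = {"Tx": ["ADB1", "ADB2"], "ADB1": ["P11", "P12"], "ADB2": ["P21", "P22"]}
order = ["Tx", "ADB1", "ADB2", "P11", "P12", "P21", "P22"]
clan_tables = {
    "ADB1": {frozenset({"P11", "P12"}): ("ADB1", 1)},
    "ADB2": {frozenset({"P21", "P22"}): ("ADB2", 1)},
    "Tx": {frozenset({"ADB1", "ADB2"}): ("Tx", 1)},
}
result = synthesize(children, order, {"P11": 1, "P12": 1, "P21": 1, "P22": 1}, clan_tables)
```

Each clan is visited once, so the number of table lookups grows with the number of clans rather than with the size of the full system description.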

Theorem 2 Given a tree-structured decomposition graph
$G_X$ and local component diagnoses, diagnostics synthesis
will compute a sound and globally consistent set of fault
mode assignments for components $X \in G_X$ within $O(|G_Y|)$
message-passing steps, where $G_Y$ is the clan graph generated
from $G_X$.

Example 2 Diagnosis Synthesis in a Clan: Consider
Scenario 3 of Table 1. For this observation $\theta$, the total set of
possible clan diagnoses is: [($P_{11}$, audio-fail) $\wedge$ ($P_{12}$, audio-fail)]
$\vee$ ($ADB_1$, $X_{audio}$). The weights of the diagnoses are 2
and 1, respectively.

In computing diagnoses on a purely local basis, the resulting
diagnosis is ($P_{11}$, audio-fail) $\wedge$ ($P_{12}$, audio-fail), with
weight 2. Note, however, that there is a family diagnosis of weight
1, ($ADB_1$, $X_{audio}$), which is selected since it is of lower
weight than the distributed diagnosis. We now instantiate
each local component with $\theta$, and set diagnoses as follows:
($P_{11}$, 0), ($P_{12}$, 0), ($ADB_1$, $X_{audio}$). There exists a consistent
set of local variable instantiations for this assignment, so
no further message-passing is necessary.


[Figure 4 panel labels: Local Dx, Family Dx]

Figure 4: Diagnosis synthesis procedure, Step 1: (a) local
diagnoses synthesized at clans, and (b) clan diagnoses are
passed between families, as noted by dark arrows.


Example 3 Message-Passing: Figure 4 shows the first stage
of this procedure. In the graph we show nodes where the
variables are restricted to fault mode variables, to simplify the
description of message-passing of instantiations of mode
variables. First, the local diagnoses are computed at each node
in the decomposition graph: all four passenger units register
a fault, and no other nodes in the decomposition graph
register faults. As a shorthand, we denote a fault-weight pair
using variable-names for faults, with 0 denoting a nominal
mode. Then, these faults are synthesized at each clan using
the clan-table: fault-weight pair ($P_{11} \wedge P_{12}$, 2) is synthesized
into ($ADB_1$, 1), and fault ($P_{21} \wedge P_{22}$, 2) is synthesized into
($ADB_2$, 1). Second, the synthesized faults ($ADB_1$, 1) and
($ADB_2$, 1) are sent to the adjacent node in the clan graph,
$Y_1$.


[Figure 5 panel labels: Local Dx, Family Dx]

Figure 5: Diagnosis synthesis procedure, Step 2: global
diagnoses computed following family diagnosis message-passing.


Figure 5 shows the second stage of this procedure.
Fault-weight pair ($ADB_1 \wedge ADB_2$, 2) is synthesized into (Tx, 1)
at the clan, and all other fault-modes are set to nominal. This
is the global minimum-weight fault.
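The weights in this example can be tallied as a short worked calculation. The notation $\kappa(\cdot)$ for the weight of a diagnosis, and the assumption that the weights of separate faults add, are conventions adopted for this sketch; the individual values are those of the fault-weight pairs quoted in the example.

```latex
\begin{align*}
\kappa(P_{11} \wedge P_{12} \wedge P_{21} \wedge P_{22}) &= 2 + 2 = 4
  && \text{(purely local diagnoses)} \\
\kappa(ADB_1 \wedge ADB_2) &= 1 + 1 = 2
  && \text{(after Step 1, Figure 4)} \\
\kappa(Tx) &= 1
  && \text{(after Step 2, Figure 5)}
\end{align*}
```

Each synthesis step replaces the faults reported within a clan by a strictly lower-weight explanation, which is why the weight falls from 4 to 2 to 1 as messages move toward the root.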

4.3  COMPLEXITY  ISSUES 

The complexity of logical resolution within a distributed
framework has been discussed in [1]. Here, our task is
model-based diagnosis within a tree-structured topology.

This approach is based on computing diagnoses for the
clans of $G$. Hence, it never needs to diagnose a system
description for the entire graph $G$, but only for the clans of $G$.
As noted in Theorem 2, once the clan tables are computed,
given any local component diagnoses, the algorithm is linear
in the number of nodes in the clan-graph.


The worst-case complexity of computing a clan table is
exponential in the number of variables in the clan table. The
memory requirements for storing the clan tables are defined
as follows. In the worst case, for a clan with mode
variables $A_1, \ldots, A_m$, where each mode variable $A_i$ has $|\omega_{A_i}|$ faulty
values, a clan table stores an entry for each of the $\prod_{i=1}^{m} |\omega_{A_i}|$
multiple-fault combinations. For single-fault scenarios, a clan
table must store only $\sum_{i=1}^{m} |\omega_{A_i}|$ entries.
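To give a feel for the difference between the two storage bounds, the sketch below computes both counts for a clan. It reads the multiple-fault count as the product, and the single-fault count as the sum, of the per-variable numbers of faulty values; that reading, the function name, and the sample numbers are assumptions made for illustration.

```python
from math import prod

def clan_table_entries(faulty_value_counts):
    """Return (multiple-fault entries, single-fault entries) for one clan.

    faulty_value_counts: number of faulty values of each mode variable in the clan.
    """
    multiple_fault = prod(faulty_value_counts)  # one entry per combination of faulty values
    single_fault = sum(faulty_value_counts)     # one entry per individual faulty value
    return multiple_fault, single_fault

# Hypothetical clan with three mode variables having 2, 3 and 4 faulty values.
multi, single = clan_table_entries([2, 3, 4])
```

For this hypothetical clan the multiple-fault table needs 24 entries against 9 for the single-fault table, and the gap widens multiplicatively with every mode variable added to the clan.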

The main issue is the time-complexity of generating the
clan tables. For tree-structured systems the complexity of
diagnosing $G$ is exponential in the clan size, and the complexity
is bounded by the largest clan of $G$. Hence the complexity of
initially computing diagnoses is the same for the centralized
and distributed approaches. However, for embedded
applications, the distributed approach has a complexity advantage,
since only clan-table lookup and simple message-passing are
required.

5 RELATED  WORK 

Our  approach  to  distributed  diagnosis  has  been  preceded  by 
many  pieces  of  related  work,  and  we  review  several  here. 
Note  that  this  review  examines  the  most  relevant  work,  and 
does  not  claim  to  be  exhaustive. 

One  of  the  most  closely-related  pieces  of  work  describes 
techniques for distributed logical inference [1; 20]. This work
focuses  on  how  to  perform  logical  reasoning  and  query  an- 
swering, proposing  sound  and  complete  message  passing  al- 
gorithms, by  exploiting  the  tree  structure  of  distributed  theo- 
ries. They  examine  the  complexity  of  computation,  propose 
specialized  algorithms  for  first-order  resolution  and  focused 
consequence  finding,  and  propose  algorithms  for  optimally 
partitioning  a theory  that  is  not  already  distributed.  In  some 
ways,  our  task  can  be  considered  a special  case  of  the  general 
problem that Amir and McIlraith examine. Logical inference
computes  a model,  whereas  diagnostic  inference  computes  a 
minimal  model  in  the  assumables,  a subset  of  the  language 
of  the  theory.  We  leverage  many  aspects  of  the  specific  diag- 
nosis problem  in  our  work,  aspects  that  serve  to  distinguish 
both  our  approach  and  our  results.  These  include  the  notion 
of  causality,  which  imposes  a directionality  on  the  tree  struc- 
ture and  the  inference,  and  the  notion  of  preference.  In  ad- 
dition, the  task  of  diagnostic  inference  depends  critically  on 
two  classes  of  distinguished  variables,  assumables  (the  liter- 
als of  interest)  and  observables  (the  inputs),  and  distributed 
diagnosability  depends  on  how  assumables  and  observables 
are  distributed  among  the  collection  of  blocks.  In  addition, 
if  the  variables  common  between  two  blocks  are  observable, 
then  from  a distributed  diagnostics  point  of  view  those  blocks 
are  independent  [7]. 

The  approach  presented  here  bears  some  relation  to  diag- 
nostic approaches  on  trees.  Stumptner  and  Wotawa  [25]  have 
an  algorithm  for  diagnosing  tree-structured  systems.  This  ap- 
proach assumes  a centralized  system  defined  at  the  compo- 
nent level  whereas  our  approach  deals  with  distributed  sys- 
tems that  can  be  defined  at  any  level  of  abstraction.  In  ad- 
dition, our  assumption  of  sub-systems  computing  their  own 
diagnoses  means  that  our  diagnostic  synthesis  process  is  a 
single-pass algorithm from the leaves of the tree to the root,
whereas Stumptner and Wotawa need a two-pass approach
since  they  must  first  enumerate  all  component  diagnoses.  A 
second major tree-based method uses a clique-tree
decomposition of a system, e.g., the diagnostic method of [13]. A
clique-tree  is  a representation  that  is  used  for  many  kinds  of 
inference  in  addition  to  diagnosis,  including  probabilistic  in- 
ference and  constraint  satisfaction.  The  tree  we  generate  is  a 
directed  tree  with  a fixed  root,  and  the  nodes  of  the  tree  are 
generated  based  on  the  clan  property;  a clique-tree  is  undi- 
rected (with  an  arbitrary  root),  and  the  nodes  of  the  tree  are 
generated  based  on  the  family  property.  One  can  think  of 
the  D-tree  as  a directed  variant  of  a clique-tree,  which  is  op- 
timized for  diagnostic  inference.  In  addition,  our  approach 
uses  the  ordering  of  the  D-tree  to  require  message-passing  in 
a single  direction  only;  in  contrast,  message  propagation  in 
clique  trees  is  bi-directional. 

Our  work  also  bears  some  relation  to  papers  describing  dis- 
tributed solutions  to  Constraint  Satisfaction  Problems  (CSPs) 
[26; 15]. As with the work on distributed logical inference [1],
the task of distributed CSPs is finding a satisfying
assignment to the variables, when constraints are distributed in
a collection  of  subsets  of  constraints.  Hence  the  underlying 
tasks  of  distributed  diagnosis  and  CSP  satisfiability  are  dif- 
ferent. One  issue  in  this  work  that  is  similar  to  diagnostic 
reasoning  is  the  recording  of  minimal  sets  of  unsatisfiable 
clauses  as  nogoods  [15].  The  computation  of  nogoods  is  a 
key  step  to  computing  diagnoses  [10]. 

There  have  been  several  proposals  for  using  the  ATMS  [9] 
in  a distributed  manner,  e.g.,  [11;  19;  3;  18].  Our  approach 
differs  from  this  work  in  that  our  approach  uses  system  topol- 
ogy explicitly,  whereas  these  other  approaches  do  not  make 
as  extensive  a use  of  topology. 

The compilation approach proposed in this article bears
some relation to prior work (see footnote 7). [24] presents an empirical
comparison of centralized compilation techniques as applied to
several  areas,  of  which  diagnosis  is  one.  Our  future  work  in- 
cludes examining  the  applicability  of  these  compilation  tech- 
niques within  our  distributed  framework.  Compilation  is  also 
examined  in  [20],  but  (as  mentioned  earlier)  as  applied  to  a 
different  task,  logical  resolution. 

There  has  been  some  prior  work  on  distributed  model- 
based  diagnosis.  For  example,  the  approach  in  [14]  assumes 
that the diagnosis computed by each distributed agent is
globally correct, and examines the case where agents must
cooperate to diagnose components whose status is unknown. Our
approach  makes  the  more  realistic  assumption  that  diagnoses 
are  not  necessarily  globally  sound,  and  derives  a very  differ- 
ent global  synthesis  algorithm. 

6 SUMMARY  AND  CONCLUSIONS 

This  document  has  described  a mechanism  for  computing  dis- 
tributed diagnoses  using  system  topology  and  observability 
properties.  This  algorithm  takes  as  input  minimal  diagnoses 
computed  within  distributed  components,  and  uses  system 
topology  to  integrate  these  diagnoses  into  a globally  sound 
and  minimal  system  diagnosis. 

Footnote 7: A review of compilation can be found in [6].


We are in the process of applying this approach to two
real-world domains: diagnosis of In-Flight Entertainment
systems and of HVAC systems.

The  approach  presented  here  provides  a mechanism  for 
designing  systems  with  predictable  distributed  diagnostics 
properties.  A given  decomposition  graph  can  be  rated  accord- 
ing to  its  diagnosability  and  efficiency.  Additionally,  given  a 
system description, we can apply D-tree decomposition
algorithms to the system DAG to assist in identifying
small-treewidth decompositions, if any exist. Further, if a system
has no small-treewidth decomposition, one can then
recommend system re-design to facilitate efficient computation of
distributed diagnoses.